Figure 8 



356 




Remote Content Crawler 



360 



Remote Content 
Crawler Processor 



361 



Crawling 
Criteria 
Processor 



362 



Crawling 
Criteria 
Database 



363 



Crawler 
Content 
Provider 
Processor 




364 



Crawler 
Provider 
Database 



365 



Network Resource 
Processor 



369



Network 
Resource 
Database 



Network Crawler 



Crawling 
Servers 



366 



366a 



367 




Metadata 
Acquisition 
Processor 



368 



Content Crawler 
Results Processor 



351 



Search Engine 
Processor 



517

515

Content
Database
Server

Aggregator
Remote
Content
Database





Figure 9a



Remote Content Crawling 




Aggregation 




603 



Crawling 
Criteria 
Builder 




Content 
Provider 
Builder 



Crawling Results 
Processing 



615 



Metadata 
Retriever 
and Router 



600 



Figure 9b 



Crawl Execution 



Data 
Storage 



631 



HTTP 
Download 



Crawl 
Initiation 



633 



Crawling 
Criteria 
Checker 




635 




637 



Resource Record and 
End-of-File Identifier 



603 



Figure 10 



876 



875 



Interface with 

System 
Administrator 



888 



Initiate Crawling 
Support Processes 



877 



878 



Aggregate List of 
Network Resources 
to be Crawled 



Build and Maintain 
the Database of 
Crawling Criteria 



880 



879 



Build and Maintain 
the Database of 
Content Providers 



Route Network Resources, Search 

Criteria, and Content Provider 
Data to the Network Crawler and 
Initiate Network Crawling Process 



881 



Crawl all known Network
Addresses and Domains for 

Pages and Data that meet 
Crawling Criteria and Return 
Results to Results Processor 



885 



883 



Analyze Remote File 
and Streaming 
Metadata 



Process Crawling 
Results 



884 



Return Metadata 
Acquisition Results 
to Results Processor 



886 



Return Content 
Provider Data to 
Content Provider 
Database 



887 



Forward Crawling 
Results to
Aggregator Remote 

Content Database 



Figure 11a 



890 



891 



892 



893 



895 



881 



Receive and Cache Data 
sets from the Network 
Resource, Crawling 
Criteria, and Content 
Provider Processors 






Exchange Data with 
System Administrator and 
Initiate Crawling Process 



Divide Resource List
According to Number of 
Crawling Servers and Route 
Lists to each server 



Load Next Resource 
Record Address on the 
List 



From 900



894 



908 



Has the end 
of the Resource 
Record listing been 
reached? 



Yes

Log Data About the 

Crawl with the 
System 

Administrator 



896 



No 




Initiate HTTP 
Requests to 
Download Hypertext 
Webpage 










Download and 
Cache Target 
Hypertext Webpage 





885 



Process Crawling 
Results 



From 902 



To 897



Figure 11b
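
The flow above (steps 890 to 896) divides the resource list among the crawling servers, then walks each list one record at a time until the end of the listing is reached. A minimal Python sketch of that loop follows. The function names, the round-robin split, and the injected `fetch` callable are illustrative assumptions; the figure does not say how the list is partitioned or how pages are downloaded.

```python
from typing import Callable, Dict, List


def divide_resources(resources: List[str], num_servers: int) -> List[List[str]]:
    """Split the resource list into one sub-list per crawling server.

    Round-robin assignment is an assumption; the figure only says the list
    is divided "according to number of crawling servers".
    """
    if num_servers < 1:
        raise ValueError("num_servers must be at least 1")
    return [resources[i::num_servers] for i in range(num_servers)]


def crawl_list(resources: List[str], fetch: Callable[[str], str]) -> Dict[str, str]:
    """Load each resource record address in turn and cache the downloaded page.

    `fetch` stands in for the HTTP download step and is supplied by the caller.
    The loop ends when the end of the resource record listing is reached.
    """
    cache: Dict[str, str] = {}
    for address in resources:
        cache[address] = fetch(address)
    return cache
```

Passing the download step in as a parameter keeps the sketch independent of any particular HTTP client and lets the loop be exercised without network access.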



881 



From 896 



897 



882 



Does 
Hypertext of 
Page Meet 
Crawling 
Criteria?



Yes 



Forward Hypertext 
and URL of Page to 
Results Processor 



899 



Analyze HTML
Structure and 

Identify and Cache 
all Hyperlinks 



904 



To 893




No 



Yes 




901


Access Next Cached 
Hyperlink 











To 896




Does Hyperlink
Target meet
content type
requirements?



Yes 



907 



905 



Forward Hyperlink Data 

and Address to 
Metadata Acquisition 
Processor 



Did 
Hypertext of 
Webpage meet 
Crawling 
Criteria?



Yes 



Forward Text and
Link Data to Crawling 
Results Processor 



882 
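
The page-processing steps above (check the hypertext against the crawling criteria, cache all hyperlinks, test each hyperlink target for content type) can be sketched in Python as follows. This is an illustration under stated assumptions, not the method shown in the drawings: keyword matching stands in for the crawling criteria, a file-extension test stands in for the content type requirements, and the names `LinkCollector`, `process_page`, and `MEDIA_EXTENSIONS` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple

# Assumed content types of interest; the drawings do not list them.
MEDIA_EXTENSIONS = (".mp3", ".mp4", ".wav", ".avi")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def meets_crawling_criteria(hypertext: str, keywords: List[str]) -> bool:
    """Assumed criteria check: any keyword appears in the page text."""
    text = hypertext.lower()
    return any(keyword.lower() in text for keyword in keywords)


def process_page(hypertext: str, keywords: List[str]) -> Tuple[bool, List[str]]:
    """Return (page meets criteria, hyperlinks whose targets pass the type test)."""
    page_matches = meets_crawling_criteria(hypertext, keywords)
    collector = LinkCollector()
    collector.feed(hypertext)
    media_links = [
        link for link in collector.links
        if link.lower().endswith(MEDIA_EXTENSIONS)
    ]
    return page_matches, media_links
```

In this sketch the returned links are the ones that would be forwarded to the Metadata Acquisition Processor, and the boolean decides whether the page text is forwarded to the results processor.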



